Previous works (Donahue et al., 2018a;Engel et al., 2019a) have found that generating coherent raw audio waveforms with GANs is challenging. In this paper, we show that it is possible to train GANs reliably to generate high quality coherent waveforms by introducing a set of architectural changes and simple training techniques. Subjective evaluation metric (Mean Opinion Score, or MOS) shows the effectiveness of the proposed approach for high quality mel-spectrogram inversion. To establish the generality of the proposed techniques, we show qualitative results of our model in speech synthesis, music domain translation and unconditional music synthesis. We evaluate the various components of the model through ablation studies and suggest a set of guidelines to design general purpose discriminators and generators for conditional sequence synthesis tasks. Our model is non-autoregressive, fully convolutional, with significantly fewer parameters than competing models and generalizes to unseen speakers for mel-spectrogram inversion. Our pytorch implementation runs at more than 100x faster than realtime on GTX 1080Ti GPU and more than 2x faster than real-time on CPU, without any hardware specific optimization tricks.
translated by 谷歌翻译
The number of international benchmarking competitions is steadily increasing in various fields of machine learning (ML) research and practice. So far, however, little is known about the common practice as well as bottlenecks faced by the community in tackling the research questions posed. To shed light on the status quo of algorithm development in the specific field of biomedical imaging analysis, we designed an international survey that was issued to all participants of challenges conducted in conjunction with the IEEE ISBI 2021 and MICCAI 2021 conferences (80 competitions in total). The survey covered participants' expertise and working environments, their chosen strategies, as well as algorithm characteristics. A median of 72% challenge participants took part in the survey. According to our results, knowledge exchange was the primary incentive (70%) for participation, while the reception of prize money played only a minor role (16%). While a median of 80 working hours was spent on method development, a large portion of participants stated that they did not have enough time for method development (32%). 25% perceived the infrastructure to be a bottleneck. Overall, 94% of all solutions were deep learning-based. Of these, 84% were based on standard architectures. 43% of the respondents reported that the data samples (e.g., images) were too large to be processed at once. This was most commonly addressed by patch-based training (69%), downsampling (37%), and solving 3D analysis tasks as a series of 2D tasks. K-fold cross-validation on the training set was performed by only 37% of the participants and only 50% of the participants performed ensembling based on multiple identical models (61%) or heterogeneous models (39%). 48% of the respondents applied postprocessing steps.
translated by 谷歌翻译
In Novel Class Discovery (NCD), the goal is to find new classes in an unlabeled set given a labeled set of known but different classes. While NCD has recently gained attention from the community, no framework has yet been proposed for heterogeneous tabular data, despite being a very common representation of data. In this paper, we propose TabularNCD, a new method for discovering novel classes in tabular data. We show a way to extract knowledge from already known classes to guide the discovery process of novel classes in the context of tabular data which contains heterogeneous variables. A part of this process is done by a new method for defining pseudo labels, and we follow recent findings in Multi-Task Learning to optimize a joint objective function. Our method demonstrates that NCD is not only applicable to images but also to heterogeneous tabular data.
translated by 谷歌翻译
The 1$^{\text{st}}$ Workshop on Maritime Computer Vision (MaCVi) 2023 focused on maritime computer vision for Unmanned Aerial Vehicles (UAV) and Unmanned Surface Vehicle (USV), and organized several subchallenges in this domain: (i) UAV-based Maritime Object Detection, (ii) UAV-based Maritime Object Tracking, (iii) USV-based Maritime Obstacle Segmentation and (iv) USV-based Maritime Obstacle Detection. The subchallenges were based on the SeaDronesSee and MODS benchmarks. This report summarizes the main findings of the individual subchallenges and introduces a new benchmark, called SeaDronesSee Object Detection v2, which extends the previous benchmark by including more classes and footage. We provide statistical and qualitative analyses, and assess trends in the best-performing methodologies of over 130 submissions. The methods are summarized in the appendix. The datasets, evaluation code and the leaderboard are publicly available at https://seadronessee.cs.uni-tuebingen.de/macvi.
translated by 谷歌翻译
Occluded person re-identification (ReID) is a person retrieval task which aims at matching occluded person images with holistic ones. For addressing occluded ReID, part-based methods have been shown beneficial as they offer fine-grained information and are well suited to represent partially visible human bodies. However, training a part-based model is a challenging task for two reasons. Firstly, individual body part appearance is not as discriminative as global appearance (two distinct IDs might have the same local appearance), this means standard ReID training objectives using identity labels are not adapted to local feature learning. Secondly, ReID datasets are not provided with human topographical annotations. In this work, we propose BPBreID, a body part-based ReID model for solving the above issues. We first design two modules for predicting body part attention maps and producing body part-based features of the ReID target. We then propose GiLt, a novel training scheme for learning part-based representations that is robust to occlusions and non-discriminative local appearance. Extensive experiments on popular holistic and occluded datasets show the effectiveness of our proposed method, which outperforms state-of-the-art methods by 0.7% mAP and 5.6% rank-1 accuracy on the challenging Occluded-Duke dataset. Our code is available at https://github.com/VlSomers/bpbreid.
translated by 谷歌翻译
To apply federated learning to drug discovery we developed a novel platform in the context of European Innovative Medicines Initiative (IMI) project MELLODDY (grant n{\deg}831472), which was comprised of 10 pharmaceutical companies, academic research labs, large industrial companies and startups. The MELLODDY platform was the first industry-scale platform to enable the creation of a global federated model for drug discovery without sharing the confidential data sets of the individual partners. The federated model was trained on the platform by aggregating the gradients of all contributing partners in a cryptographic, secure way following each training iteration. The platform was deployed on an Amazon Web Services (AWS) multi-account architecture running Kubernetes clusters in private subnets. Organisationally, the roles of the different partners were codified as different rights and permissions on the platform and administrated in a decentralized way. The MELLODDY platform generated new scientific discoveries which are described in a companion paper.
translated by 谷歌翻译
我们展示了一个端到端框架,以提高人造系统对不可预见的事件的弹性。该框架基于基于物理的数字双胞胎模型和三个负责实时故障诊断,预后和重新配置的模块。故障诊断模块使用基于模型的诊断算法来检测和分离断层,并在系统中产生干预措施,以消除不确定的诊断解决方案。我们通过使用基于物理学的数字双胞胎的平行化和替代模型来扩展故障诊断算法为所需的实时性能。预后模块跟踪故障进度,并训练在线退化模型,以计算系统组件的剩余使用寿命。此外,我们使用降解模型来评估断层进程对操作要求的影响。重新配置模块使用基于PDDL的计划,并带有语义附件来调整系统控件,从而最大程度地减少了对系统操作的故障影响。我们定义一个弹性度量,并以燃料系统模型的示例来说明该指标如何通过我们的框架改进。
translated by 谷歌翻译
在这项工作中,研究了来自磁共振图像的脑年龄预测的深度学习技术,旨在帮助鉴定天然老化过程的生物标志物。生物标志物的鉴定可用于检测早期神经变性过程,以及预测与年龄相关或与非年龄相关的认知下降。在这项工作中实施并比较了两种技术:应用于体积图像的3D卷积神经网络和应用于从轴向平面的切片的2D卷积神经网络,随后融合各个预测。通过2D模型获得的最佳结果,其达到了3.83年的平均绝对误差。 - Neste Trabalho S \〜AO InvestigaDAS T \'Ecnicas de Aprendizado Profundo Para a previ \ c {c} \〜ate daade脑电站a partir de imagens de resson \ ^ ancia magn \'etica,Visando辅助Na Identifica \ c {C} \〜AO de BioMarcadores Do Processo Natural de Envelhecimento。一个identifica \ c {c} \〜ao de bioMarcarcores \'e \'util para a detec \ c {c} \〜ao de um processo neurodegenerativo em Est \'Agio无数,Al \'em de possibilitar Prever Um decl 'inio cognitivo relacionado ou n \〜ao \`一个懒惰。 Duas T \'ECICAS S \〜AO ImportyAdas E Comparadas Teste Trabalho:Uma Rede神经卷应3D APLICADA NA IMAGEM VOLUM \'ETRICA E UME REDE神经卷轴2D APLICADA A FATIAS DO PANIAS轴向,COM后面fus \〜AO DAS PREDI \ C {c} \ \ oes个人。 o Melhor ResultAdo Foi optido Pelo Modelo 2D,Que Alcan \ C {C} OU UM ERRO M \'EDIO ABSOLUTO DE 3.83 ANOS。
translated by 谷歌翻译
显着对象检测(SOD)在图像分析中具有若干应用。基于深度学习的SOD方法是最有效的,但它们可能会错过具有相似颜色的前景部分。为了规避问题,我们介绍了一个后处理方法,名为\ Texit {SuperPixel Materionity}(Sess)的后期处理方法,其交替地执行两个操作,以便显着完成:基于对象的SuperPixel分段和基于SuperPixel的显着性估算。 Sess使用输入显着图来估算超像素描绘的种子,并在前景和背景中定义超顶盒查询。新的显着性图是由查询和超像素之间的颜色相似性产生的。对于给定数量的迭代的过程重复,使得所有产生的显着性图通过蜂窝自动机组合成单个。最后,使用其平均值合并后处理和初始映射。我们展示SES可以始终如一地,并在五个图像数据集上一致而大大提高三种基于深度学习的SOD方法的结果。
translated by 谷歌翻译
Since a lexicon-based approach is more elegant scientifically, explaining the solution components and being easier to generalize to other applications, this paper provides a new approach for offensive language and hate speech detection on social media. Our approach embodies a lexicon of implicit and explicit offensive and swearing expressions annotated with contextual information. Due to the severity of the social media abusive comments in Brazil, and the lack of research in Portuguese, Brazilian Portuguese is the language used to validate the models. Nevertheless, our method may be applied to any other language. The conducted experiments show the effectiveness of the proposed approach, outperforming the current baseline methods for the Portuguese language.
translated by 谷歌翻译